

Search for: All records

Creators/Authors contains: "Wiest, Olaf"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Pretraining NERF models on chemically related mechanisms significantly improves performance compared to pretraining on larger but mechanistically dissimilar reaction datasets. 
    Free, publicly-accessible full text available May 14, 2026
  2. Free, publicly-accessible full text available September 22, 2026
  3. Chemical reaction data has existed, and still largely exists, in unstructured form. Curating such information into datasets suitable for tasks such as yield and reaction-outcome prediction is impractical by manual curation and not possible to automate through programmatic means alone. Large language models (LLMs) have emerged as potent tools with remarkable capabilities for processing textual information and could therefore be extremely useful in automating this process. To address the challenge of unstructured data, we manually curated a dataset of structured chemical reaction data to fine-tune and evaluate LLMs. We propose a paradigm that leverages prompt tuning, fine-tuning, and a verifier that checks the extracted information. We evaluate the capabilities of various LLMs, including LLAMA-2 and GPT models with different parameter counts, on the data-extraction task. Our results show that prompt tuning of GPT-4 yields the best accuracy and evaluation results. Fine-tuning LLAMA-2 models with hundreds of samples does, however, enable them to extract and organize scientific material according to user-defined schemas. This workflow demonstrates an adaptable approach to chemical reaction data extraction while also highlighting the challenges posed by nuance in chemical information. Our code is open-sourced on GitHub. 
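The extract-then-verify paradigm described above can be sketched as a loop that prompts an LLM for schema-conformant JSON and re-prompts until a verifier accepts the result. The prompt wording, schema keys, and the stubbed `call_llm` function below are illustrative assumptions, not the paper's actual code or API.

```python
# Sketch of an LLM extraction workflow with a verifier step.
# call_llm is a placeholder for a real model API (e.g., GPT-4 via an SDK);
# it returns a canned answer here so the sketch is runnable.
import json

SCHEMA_KEYS = {"reactants", "products", "yield"}  # hypothetical user-defined schema

def call_llm(prompt):
    # Stub standing in for a real LLM call.
    return json.dumps({
        "reactants": ["phenylboronic acid", "cyclohexenone"],
        "products": ["3-phenylcyclohexanone"],
        "yield": "92%",
    })

def verify(record):
    # Verifier: reject outputs that are not dicts covering the schema keys.
    return isinstance(record, dict) and SCHEMA_KEYS.issubset(record)

def extract_reaction(text, max_retries=2):
    prompt = f"Extract reactants, products, and yield as JSON:\n{text}"
    for _ in range(max_retries + 1):
        try:
            record = json.loads(call_llm(prompt))
        except json.JSONDecodeError:
            continue  # malformed output: re-prompt
        if verify(record):
            return record
    return None  # give up after exhausting retries

record = extract_reaction("Phenylboronic acid was added to cyclohexenone ...")
```

In practice the verifier is what makes the pipeline robust: malformed or schema-violating completions are caught and retried rather than silently entering the dataset.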
  4. The application of computational methods in enantioselective catalysis has evolved from the rationalization of the observed stereochemical outcome to its prediction and to the design of chiral ligands. This Perspective provides an overview of the current methods used, ranging from atomistic modeling of the transition structures involved to correlation-based methods, with particular emphasis placed on the Q2MM/CatVS method. Using three enantioselective palladium-catalyzed reactions as case studies, namely, the conjugate addition of arylboronic acids to enones, the enantioselective redox relay Heck reaction, and the Tsuji–Trost allylic amination, we argue that computational methods have become truly equal partners to experimental studies in that, in some cases, they are able to correct published stereochemical assignments. Finally, the consequences of this approach for data-driven methods are discussed. 
  5. A proline-squaraine ligand (Pro-SqEB) that demonstrates high levels of stereoselectivity in olefin cyclopropanations when anchored to a Rh2(II) scaffold is introduced. High yields and enantioselectivities were achieved in the cyclopropanation of alkenes with diazo compounds in the presence of Rh2(Pro-SqEB)4. Notably, the unique electronic and steric design of this catalyst enabled the use of polar solvents that are otherwise incompatible with most Rh(II) complexes. 
  6. Molecular representation learning (MRL) is a key step in building the connection between machine learning and chemical science. In particular, it encodes molecules as numerical vectors that preserve molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has achieved considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, especially the methods incorporating chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. Then we summarize and categorize MRL methods into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets in the paper. Finally, we share our thoughts on future research directions. 
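The 2D molecular graph encoding that MRL methods build on can be illustrated with a minimal sketch: atoms become node feature vectors and bonds an adjacency matrix. The element-only one-hot features below are a simplifying assumption; real pipelines (e.g., those built on RDKit) use much richer atom and bond features.

```python
# Minimal sketch of a 2D molecular graph representation.
ELEMENTS = ["C", "N", "O"]  # toy vocabulary; real feature sets are far larger

def one_hot(symbol):
    # Encode an element symbol as a one-hot vector over the toy vocabulary.
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

def molecular_graph(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j) atom-index pairs."""
    node_features = [one_hot(a) for a in atoms]
    adjacency = [[0] * len(atoms) for _ in atoms]
    for i, j in bonds:
        adjacency[i][j] = adjacency[j][i] = 1  # molecular graphs are undirected
    return node_features, adjacency

# Toy three-atom fragment, C-C-O (bond orders omitted for brevity)
feats, adj = molecular_graph(["C", "C", "O"], [(0, 1), (1, 2)])
```

A graph neural network would then aggregate these node features along the adjacency structure to produce the molecule-level vector used by downstream property-prediction tasks.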